Designer | ARM Holdings |
---|---|
Bits | 32-bit |
Introduced | 1983 |
Version | ARMv7 |
Design | RISC |
Type | Register-Register |
Encoding | Fixed |
Branching | Condition code |
Endianness | Bi |
Extensions | NEON, Thumb, Jazelle, VFP |
Registers | 16 |
The ARM is a 32-bit reduced instruction set computer (RISC) instruction set architecture (ISA) developed by ARM Holdings. It was formerly known as the Advanced RISC Machine, and before that as the Acorn RISC Machine. The ARM architecture is the most widely used 32-bit ISA in terms of numbers produced.[1][2] It was originally conceived by Acorn Computers as a processor for desktop personal computers, a market now dominated by the x86 family used by IBM PC compatible computers. The relative simplicity of ARM processors made them suitable for low-power applications, and as relatively low-cost, small microprocessors and microcontrollers they have become dominant in the mobile and embedded electronics market.
As of 2007, about 98 percent of the more than one billion mobile phones sold each year use at least one ARM processor.[3] As of 2009, ARM processors account for approximately 90% of all embedded 32-bit RISC processors. ARM processors are used extensively in consumer electronics, including PDAs, mobile phones, digital media and music players, hand-held game consoles, calculators and computer peripherals such as hard drives and routers.
The ARM architecture is licensable. Companies that are current or former ARM licensees include Alcatel-Lucent, Apple Inc., Atmel, Broadcom, Cirrus Logic, Digital Equipment Corporation, Freescale, Intel (through DEC), LG, Marvell Technology Group, Microsoft, NEC, Nuvoton, NVIDIA, NXP (previously Philips), Oki, Qualcomm, Samsung, Sharp, STMicroelectronics, Symbios Logic, Texas Instruments, VLSI Technology, Yamaha and ZiiLABS.
ARM processors are developed by ARM and by ARM licensees. Prominent ARM processor families designed by ARM Holdings include the ARM7, ARM9, ARM11 and Cortex. Examples of ARM processors developed by major licensees include DEC StrongARM, Freescale i.MX, Marvell (formerly Intel) XScale, Nintendo, NVIDIA Tegra, ST-Ericsson Nomadik, Qualcomm Snapdragon, and the Texas Instruments OMAP product line.
After achieving some success with the BBC Micro computer, Acorn Computers Ltd considered how to move on from the relatively simple MOS Technology 6502 processor to address business markets like the one that would soon be dominated by the IBM PC, launched in 1981. The Acorn Business Computer (ABC) plan required a number of second processors to be made to work with the BBC Micro platform, but processors such as the Motorola 68000 and National Semiconductor 32016 were unsuitable, and the 6502 was not powerful enough for a graphics-based user interface.
Acorn would need a new architecture, having tested all of the available processors and found them wanting. Acorn then seriously considered designing its own processor, and its engineers came across papers on the Berkeley RISC project. They felt it showed that if a class of graduate students could create a competitive 32-bit processor, then Acorn would have no problem. A trip to the Western Design Center in Phoenix showed Acorn engineers Steve Furber and Sophie Wilson that they did not need massive resources and state-of-the-art R&D facilities.
Wilson set about developing the instruction set, writing a simulation of the processor in BBC Basic that ran on a BBC Micro with a 6502 second processor. It convinced the Acorn engineers that they were on the right track. Before they could go any further, however, they would need more resources. It was time for Wilson to approach Acorn's CEO, Hermann Hauser, and explain what was afoot. Once the go-ahead had been given, a small team was put together to implement Wilson's model in hardware.
The official Acorn RISC Machine project started in October 1983. VLSI Technology, Inc. was chosen as silicon partner, since it already supplied Acorn with ROMs and some custom chips. The design was led by Wilson and Furber, with a key design goal of achieving low-latency input/output (interrupt) handling like the MOS Technology 6502 used in Acorn's existing computer designs. The 6502's memory access architecture had allowed developers to produce fast machines without the use of costly direct memory access hardware. VLSI produced the first ARM silicon on 26 April 1985; it worked the first time and came to be known as ARM1.[4] The first "real" production systems, named ARM2, were available the following year.
Its first practical application was as a second processor to the BBC Micro, where it was used to develop the simulation software to finish work on the support chips (VIDC, IOC, MEMC) and to speed up the operation of the CAD software used in developing ARM2. Wilson subsequently coded BBC Basic in ARM assembly language, and the in-depth knowledge obtained from designing the instruction set allowed the code to be very dense, making ARM BBC Basic an extremely good test for any ARM emulator. The original aim of a principally ARM-based computer was achieved in 1987 with the release of the Acorn Archimedes.
Such was the secrecy surrounding the ARM CPU project that when Olivetti were negotiating to take a controlling share of Acorn in 1985, they were not told about the development team until after the negotiations had been finalised. In 1992 Acorn once more won the Queen's Award for Technology for the ARM.
The ARM2 featured a 32-bit data bus, a 26-bit address space and sixteen 32-bit registers. Program code had to lie within the first 64 MB of memory, because the program counter was limited to 26 bits: the top six and bottom two bits of the 32-bit R15 register held the status flags, the interrupt-disable bits and the processor mode. The ARM2 was possibly the simplest useful 32-bit microprocessor in the world, with only 30,000 transistors (compared with around 70,000 in Motorola's 68000, introduced six years earlier). Much of this simplicity comes from not having microcode (which represents about one-quarter to one-third of the 68000) and, like most CPUs of the day, not including any cache. This simplicity led to its low power usage, while performing better than the Intel 80286.[5] A successor, ARM3, was produced with a 4 KB cache, which further improved performance.
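As an illustration of the 26-bit layout, the following C sketch splits an R15 value into its fields. It assumes the commonly documented ARM2 arrangement: bits 31 to 28 hold the N, Z, C and V flags, bits 27 and 26 the IRQ and FIQ disable bits, bits 25 to 2 the word-aligned program counter, and bits 1 and 0 the processor mode. The type and function names are hypothetical.

```c
#include <stdint.h>

/* Fields of the combined PC/status register (R15) on 26-bit ARMs such as
   the ARM2 (layout assumed as described above; names are hypothetical). */
typedef struct {
    unsigned n, z, c, v;  /* condition flags, bits 31..28 */
    unsigned i, f;        /* IRQ / FIQ disable bits, bits 27..26 */
    uint32_t pc;          /* word-aligned byte address, bits 25..2 */
    unsigned mode;        /* processor mode, bits 1..0 */
} r15_fields;

r15_fields decode_r15(uint32_t r15) {
    r15_fields out;
    out.n = (r15 >> 31) & 1u;
    out.z = (r15 >> 30) & 1u;
    out.c = (r15 >> 29) & 1u;
    out.v = (r15 >> 28) & 1u;
    out.i = (r15 >> 27) & 1u;
    out.f = (r15 >> 26) & 1u;
    out.pc = r15 & 0x03FFFFFCu;  /* keep bits 25..2 only */
    out.mode = r15 & 3u;
    return out;
}
```

Because the PC field is masked with 0x03FFFFFC, the highest reachable instruction address is 0x03FFFFFC, just below 64 MB, which is why program code had to lie in the first 64 MB.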
In the late 1980s Apple Computer and VLSI Technology started working with Acorn on newer versions of the ARM core. The work was so important that Acorn spun off the design team in 1990 into a new company called Advanced RISC Machines Ltd. For this reason, ARM is sometimes expanded as Advanced RISC Machine instead of Acorn RISC Machine. Advanced RISC Machines became ARM Ltd when its parent company, ARM Holdings plc, floated on the London Stock Exchange and NASDAQ in 1998.[6]
The new Apple-ARM work would eventually turn into the ARM6, first released in early 1992. Apple used the ARM6-based ARM 610 as the basis for its Apple Newton PDA. In 1994, Acorn used the ARM 610 as the main CPU in its Risc PC computers. DEC licensed the ARM6 architecture and produced the StrongARM. At 233 MHz this CPU drew only 1 watt of power (more recent versions draw far less). This work was later passed to Intel as part of a lawsuit settlement, and Intel took the opportunity to supplement its aging i960 line with the StrongARM. Intel later developed its own high-performance implementation, known as XScale, which it has since sold to Marvell.
The ARM core has remained largely the same size throughout these changes. ARM2 had 30,000 transistors, while the ARM6 grew to only 35,000. ARM's business has always been to sell IP cores, which licensees use to create microcontrollers and CPUs based on this core. The most successful implementation has been the ARM7TDMI, with hundreds of millions sold. The idea is that the original design manufacturer (ODM) combines the ARM core with a number of optional parts to produce a complete CPU, one that can be built on old semiconductor fabs and still deliver substantial performance at a low cost. Atmel was an early design center for ARM7TDMI-based embedded systems.
ARM licensed about 1.6 billion cores in 2005. In 2005, about 1 billion ARM cores went into mobile phones.[7] As of January 2008, over 10 billion ARM cores have been built, and iSuppli predicts that 5 billion a year will ship in 2011.[8]
The architecture used in smartphones, personal digital assistants and other handheld devices is anything from ARMv5 in obsolete/low-end devices to ARMv7 in current high-end devices. XScale and ARM926 processors are ARMv5TE, and are now more numerous in high-end devices than the StrongARM, ARM9TDMI and ARM7TDMI based ARMv4 processors, but lower-end devices may use older cores with lower licensing costs. ARMv6 processors represented a step up in performance from standard ARMv5 cores, and are used in some cases, but Cortex processors (ARMv7) now provide faster and more power-efficient options than all those previous generations. Cortex-A targets applications processors, as needed by smartphones that previously used ARM9 or ARM11. Cortex-R targets real-time applications, and Cortex-M targets microcontrollers.
In 2009, some manufacturers introduced netbooks based on ARM architecture CPUs, in direct competition with netbooks based on Intel Atom.[9]
ARM provides a summary of the numerous vendors who implement ARM cores in their designs (see the 2003 Line Card). Keil also provides a more recent summary of vendors of ARM-based processors. ARM further provides a relatively coarse overview chart of its processor lineup, comparing performance and capabilities across the more current ARM7, ARM9, ARM11, Cortex-Mx and Cortex-Ax device families.
Family | Architecture Version | Core | Feature | Cache (I/D)/MMU | Typical MIPS @ MHz | In application |
---|---|---|---|---|---|---|
ARM1 | ARMv1 (obsolete) | ARM1 | None | ARM Evaluation System second processor for BBC Micro | ||
ARM2 | ARMv2 (obsolete) | ARM2 | Architecture 2 added the MUL (multiply) instruction | None | 4 MIPS @ 8 MHz, 0.33 DMIPS/MHz | Acorn Archimedes, Chessmachine |
ARMv2a (obsolete) | ARM250 | Integrated MEMC (MMU), Graphics and IO processor. Architecture 2a added the SWP and SWPB (swap) instructions. | None, MEMC1a | 7 MIPS @ 12 MHz | Acorn Archimedes | |
ARM3 | ARMv2a (obsolete) | ARM2a | First use of a processor cache on the ARM. | 4 KB unified | 12 MIPS @ 25 MHz, 0.50 DMIPS/MHz | Acorn Archimedes |
ARM6 | ARMv3 (obsolete) | ARM60 | v3 architecture first to support addressing 32 bits of memory (as opposed to 26 bits) | None | 10 MIPS @ 12 MHz | 3DO Interactive Multiplayer, Zarlink GPS Receiver |
ARM600 | As ARM60, cache and coprocessor bus (for FPA10 floating-point unit). | 4K unified | 28 MIPS @ 33 MHz | |||
ARM610 | As ARM60, cache, no coprocessor bus. | 4 KB unified | 17 MIPS @ 20 MHz, 0.65 DMIPS/MHz | Acorn Risc PC 600, Apple Newton 100 series |
ARM7 | ARMv3 (obsolete) | ARM700 | 8 KB unified | 40 MHz | Acorn Risc PC prototype CPU card | |
ARM710 | As ARM700 | 8 KB unified | 40 MHz | Acorn Risc PC 700 | ||
ARM710a | As ARM700 | 8 KB unified | 40 MHz, 0.68 DMIPS/MHz | Acorn Risc PC 700, Apple eMate 300 |
ARM7100 | As ARM710a, integrated SoC. | 8 KB unified | 18 MHz | Psion Series 5 | ||
ARM7500 | As ARM710a, integrated SoC. | 4 KB unified | 40 MHz | Acorn A7000 | ||
ARM7500FE | As ARM7500, "FE" added FPA and EDO memory controller. | 4 KB unified | 56 MHz, 0.73 DMIPS/MHz | Acorn A7000+ Network Computer |
ARM7TDMI | ARMv4T | ARM7TDMI(-S) | 3-stage pipeline, Thumb | None | 15 MIPS @ 16.8 MHz, 63 DMIPS @ 70 MHz | Game Boy Advance, Nintendo DS, Apple iPod, Lego NXT, Atmel AT91SAM7, Juice Box, NXP Semiconductors LPC2000 and LH754xx, Actel's CoreMP7 |
ARM710T | As ARM7TDMI, cache | 8 KB unified, MMU | 36 MIPS @ 40 MHz | Psion Series 5mx, Psion Revo/Revo Plus/Diamond Mako | ||
ARM720T | As ARM7TDMI, cache | 8 KB unified, MMU with Fast Context Switch Extension | 60 MIPS @ 59.8 MHz | Zipit Wireless Messenger, NXP Semiconductors LH7952x | ||
ARM740T | As ARM7TDMI, cache | MPU | ||||
ARMv5TEJ | ARM7EJ-S | 5-stage pipeline, Thumb, Jazelle DBX, Enhanced DSP instructions | none | |||
StrongARM | ARMv4 | SA-110 | | 16 KB/16 KB, MMU | 203 MHz, 1.0 DMIPS/MHz | Apple Newton 2x00 series, Acorn Risc PC, Rebel/Corel Netwinder, Chalice CATS |
SA-1100 | As SA-110, integrated SoC | 16 KB/8 KB, MMU | 203 MHz | Psion netBook | ||
SA-1110 | As SA-110, integrated SoC | 16 KB/8 KB, MMU | 206 MHz | LART (computer), Intel Assabet, Ipaq H36x0, Balloon2, Zaurus SL-5x00, HP Jornada 7xx, Jornada 560 series, Palm Zire 31 | ||
ARM8 | ARMv4 | ARM810[10] | 5-stage pipeline, static branch prediction, double-bandwidth memory | 8 KB unified, MMU | 84 MIPS @ 72 MHz, 1.16 DMIPS/MHz | Acorn Risc PC prototype CPU card |
ARM9TDMI | ARMv4T | ARM9TDMI | 5-stage pipeline, Thumb | none | ||
ARM920T | As ARM9TDMI, cache | 16 KB/16 KB, MMU with FCSE (Fast Context Switch Extension)[11] | 200 MIPS @ 180 MHz | Armadillo, Atmel AT91RM9200, AT91SAM9, GP32, GP2X (first core), Tapwave Zodiac (Motorola i. MX1), Hewlett-Packard HP-49/50 Calculators, Sun SPOT, Cirrus Logic EP9302, EP9307, EP9312, EP9315, Samsung S3C2442 (HTC TyTN, FIC Neo FreeRunner[12]), Samsung S3C2410 (TomTom navigation devices)[13] | ||
ARM922T | As ARM9TDMI, caches | 8 KB/8 KB, MMU | NXP Semiconductors LH7A40x | |||
ARM940T | As ARM9TDMI, caches | 4 KB/4 KB, MPU | GP2X (second core), Meizu M6 Mini Player[14][15] | |||
ARM9E | ARMv5TE | ARM946E-S | Thumb, Enhanced DSP instructions, caches | variable, tightly coupled memories, MPU | Nintendo DS, Nokia N-Gage, Canon PowerShot A470, Canon EOS 5D Mark II[16], Conexant 802.11 chips, Samsung S5L2010 | |
ARM966E-S | Thumb, Enhanced DSP instructions | no cache, TCMs | ST Micro STR91xF, includes Ethernet[17] | |||
ARM968E-S | As ARM966E-S | no cache, TCMs | NXP Semiconductors LPC2900 | |||
ARMv5TEJ | ARM926EJ-S | Thumb, Jazelle DBX, Enhanced DSP instructions | variable, TCMs, MMU | 220 MIPS @ 200 MHz | Mobile phones: Sony Ericsson (K, W series); Siemens and BenQ (x65 series and newer); LG Arena; Texas Instruments OMAP1710, OMAP1610, OMAP1611, OMAP1612, OMAP-L137, OMAP-L138; Qualcomm MSM6100, MSM6125, MSM6225, MSM6245, MSM6250, MSM6255A, MSM6260, MSM6275, MSM6280, MSM6300, MSM6500, MSM6800; Freescale i.MX21, i.MX27; Atmel AT91SAM9; NXP Semiconductors LPC3000; GPH Wiz; NEC C10046F5-211-PN2-A SoC (undocumented core in the ATi Hollywood graphics chip used in the Wii)[18]; Samsung S3C2412 used in the Squeezebox Duet's controller and Squeezebox Radio; NeoMagic MiMagic Family MM6, MM6+, MM8, MTV; Buffalo TeraStation Live (NAS); Telechips TCC7801, TCC7901; ZiiLABS' ZMS-05 system on a chip; Western Digital MyBook I World Edition; Rockchip RK2806 and RK2808 |
ARMv5TE | ARM996HS | Clockless processor, as ARM966E-S | no caches, TCMs, MPU | |||
ARM10E | ARMv5TE | ARM1020E | 6-stage pipeline, Thumb, Enhanced DSP instructions, (VFP) | 32 KB/32 KB, MMU | ||
ARM1022E | As ARM1020E | 16 KB/16 KB, MMU | ||||
ARMv5TEJ | ARM1026EJ-S | Thumb, Jazelle DBX, Enhanced DSP instructions, (VFP) | variable, MMU or MPU | | Western Digital MyBook II World Edition; Conexant so4610 and so4615 ADSL SoC |
XScale | ARMv5TE | 80200/IOP310/IOP315 | I/O Processor, Thumb, Enhanced DSP instructions | |||
80219 | 400/600 MHz | Thecus N2100 | ||||
IOP321 | 600 BogoMips @ 600 MHz | Iyonix | ||||
IOP33x | ||||||
IOP34x | 1–2 core, RAID Acceleration | 32K/32K L1, 512K L2, MMU | ||||
PXA210/PXA250 | Applications processor, 7-stage pipeline | PXA210: 133 and 200 MHz, PXA250: 200, 300, and 400 MHz | Zaurus SL-5600, iPAQ H3900, Sony CLIÉ NX60, NX70V, NZ90 | |||
PXA255 | 32KB/32KB, MMU | 400 BogoMips @ 400 MHz; 371–533 MIPS @ 400 MHz[19] | Gumstix basix & connex, Palm Tungsten E2, Zaurus SL-C860, Mentor Ranger & Stryder, iRex ILiad | |||
PXA263 | 200, 300 and 400 MHz | Sony CLIÉ NX73V, NX80V | ||||
PXA26x | default 400 MHz, up to 624 MHz | Palm Tungsten T3 | ||||
PXA27x | Applications processor | 32 KB/32 KB, MMU | 800 MIPS @ 624 MHz | Gumstix verdex,"Trizeps-Modules" PXA270 COM, HTC Universal, HP hx4700, Zaurus SL-C1000, 3000, 3100, 3200, Dell Axim x30, x50, and x51 series, Motorola Q, Balloon3, Trolltech Greenphone, Palm TX, Motorola Ezx Platform A728, A780, A910, A1200, E680, E680i, E680g, E690, E895, Rokr E2, Rokr E6, Fujitsu Siemens LOOX N560, Toshiba Portégé G500, Trēo 650-755p, Zipit Z2, HP iPaq 614c Business Navigator. | ||
PXA800(E)F | ||||||
PXA3XX (codenamed "Monahans") | 32KB/32KB L1, TCM, MMU | 1000 MIPS @ 1.25 GHz | Samsung Omnia | |||
PXA900 | Blackberry 8700, Blackberry Pearl (8100) | |||||
IXC1100 | Control Plane Processor | |||||
IXP2400/IXP2800 | ||||||
IXP2850 | ||||||
IXP2325/IXP2350 | ||||||
IXP42x | NSLU2 IXP460/IXP465 | |||||
ARM11 | ARMv6 | ARM1136J(F)-S[20] | 8-stage pipeline, SIMD, Thumb, Jazelle DBX, (VFP), Enhanced DSP instructions | variable, MMU | 740 @ 532–665 MHz (i.MX31 SoC), 400–528 MHz | Texas Instruments OMAP2420 (Nokia E90, Nokia N93, Nokia N95, Nokia N82), Zune, BUGbase[2], Nokia N800, Nokia N810, Qualcomm MSM7200 (with integrated ARM926EJ-S coprocessor @ 274 MHz, used in Eten Glofiish, HTC TyTN II, HTC Nike), Freescale i.MX31 (used in the original Zune 30 GB, Toshiba Gigabeat S and Kindle DX), Freescale MXC300-30 (Nokia E63, Nokia E71, Nokia 5800, Nokia E51, Nokia 6700 Classic, Nokia 6120 Classic, Nokia 6210 Navigator, Nokia 6220 Classic, Nokia 6290, Nokia 6710 Navigator, Nokia 6720 Classic, Nokia E75, Nokia N97, Nokia N81), Qualcomm MSM7201A as seen in the HTC Dream, HTC Magic, Motorola Z6, HTC Hero and Samsung SGH-i627 (Propel Pro), Sony Ericsson Xperia X10 Mini Pro, Qualcomm MSM7227 used in ZTE Link[21][22] |
ARMv6T2 | ARM1156T2(F)-S | 9-stage pipeline, SIMD, Thumb-2, (VFP), Enhanced DSP instructions | variable, MPU | |||
ARMv6KZ | ARM1176JZ(F)-S | As ARM1136EJ(F)-S | variable, MMU+TrustZone | Apple iPhone (original and 3G), Apple iPod touch (1st and 2nd Generation), Conexant CX2427X, Motorola RIZR Z8, Motorola RIZR Z10, NVIDIA GoForce 6100[23]; Telechips TCC9101, TCC9201, TCC8900, Fujitsu MB86H60, Samsung S3C6410 (e.g. Samsung Omnia II, Samsung Moment, SmartQ 5), S3C6430[24] | ||
ARMv6K | ARM11 MPCore | As ARM1136EJ(F)-S, 1–4 core SMP | variable, MMU | Nvidia APX 2500 | ||
Cortex | ARMv7-A | Cortex-A5 | VFP, NEON, Jazelle RCT and DBX, Thumb-2, 8-stage pipeline, 1–4 core SMP | variable (L1), MMU+TrustZone | up to 1500 (1.5 DMIPS/MHz) | "Sparrow" (ARM code name)[25][26][27] |
Cortex-A8 | VFP, NEON, Jazelle RCT, Thumb-2, 13-stage superscalar pipeline | variable (L1+L2), MMU+TrustZone | up to 2000 (2.0 DMIPS/MHz in speed from 600 MHz to greater than 1 GHz) | Texas Instruments OMAP3xxx series, SBM7000, Oregon State University OSWALD, Gumstix Overo Earth, Pandora, Apple iPhone 3GS, Apple iPod touch (3rd Generation), Apple iPad (Apple A4 processor), Apple iPhone 4 (Apple A4 processor), Archos 5, FreeScale i.MX51-SOC, BeagleBoard, Motorola Droid, Motorola Droid X, Palm Pre, Samsung Omnia HD, Samsung Wave S8500, Samsung i9000 Galaxy S, Sony Ericsson Satio, Touch Book, Nokia N900, Meizu M9, ZiiLABS ZMS-08 system on a chip. | ||
Cortex-A9 | Application profile, (VFP), (NEON), Jazelle RCT and DBX, Thumb-2, Out-of-order speculative issue superscalar | MMU+TrustZone | 2.5 DMIPS/MHz | |||
Cortex-A9 MPCore | As Cortex-A9, 1–4 core SMP | MMU+TrustZone | 10,000 DMIPS @ 2 GHz on Performance Optimized TSMC 40G (dual core) (2.5 DMIPS/MHz per core) | Texas Instruments OMAP4430/4440, ST-Ericsson U8500, Nvidia Tegra2, Qualcomm Snapdragon 8X72 | ||
ARMv7-R | Cortex-R4(F) | Embedded profile, Thumb-2, (FPU) | variable cache, MPU optional | 600 DMIPS @ 475 MHz | Broadcom is a user, TMS570 from Texas Instruments | |
ARMv7E-M | Cortex-M4 (codenamed "Merlin")[28] | Microcontroller profile, both Thumb and Thumb-2, FPU. Hardware MAC, SIMD and divide instructions. | MPU optional. | 1.25 DMIPS/MHz | Freescale Kinetis |
ARMv7-M | Cortex-M3 | Microcontroller profile, Thumb-2 only. Hardware divide instruction. | no cache, MPU optional. | 125 DMIPS @ 100 MHz | Texas Instruments Stellaris microcontroller family, ST Microelectronics STM32, NXP Semiconductors LPC1700, Toshiba TMPM330FDFG, Ember's EM3xx Series, Atmel AT91SAM3, Europe Technologies EasyBCU, Energy Micro's EFM32, Actel's SmartFusion | |
ARMv6-M | Cortex-M0 (codenamed "Swift")[29] | Microcontroller profile, Thumb-2 subset (16-bit Thumb instructions & BL, MRS, MSR, ISB, DSB, and DMB). | No cache. | 0.9 DMIPS/MHz | NXP Semiconductors NXP LPC1100[30], Triad Semiconductor [31], Melfas[32], Chungbuk Technopark [33], Nuvoton [34], austriamicrosystems [35], Rohm [36] | |
Cortex-M1 | FPGA targeted, Microcontroller profile, Thumb-2 subset (16-bit Thumb instructions & BL, MRS, MSR, ISB, DSB, and DMB). | None, tightly coupled memory optional. | Up to 136 DMIPS @ 170 MHz[37] (0.8 DMIPS/MHz,[38] MHz achievable FPGA-dependent) | Actel ProASIC3, ProASIC3L, IGLOO and Fusion PSC devices, Altera Cyclone III, other FPGA products are also supported e.g. Synplicity | ||
There has long been an "ARM Architecture Reference Manual", distinguishing interfaces that all ARM processors are required to support (such as instruction semantics) from implementation details that may vary. The architecture has evolved over time, and starting with the v7 architecture three "profiles" are defined: the "A" (application), "R" (realtime), and "M" (microcontroller) profiles.
Profiles are allowed to subset the architecture. For example, the ARMv7-M profile is notable in that it supports only the Thumb processor mode and so only executes Thumb-2 instructions, and the ARMv6-M profile is a subset of the ARMv7-M profile (supporting fewer instructions).
To keep the design clean, simple and fast, the original ARM implementation was hardwired without microcode, like the much simpler 8-bit 6502 processor used in prior Acorn microcomputers.
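The ARM architecture includes the following RISC features: a load/store (register-register) architecture, a uniform register file of sixteen 32-bit registers, and fixed-width 32-bit instructions that ease decoding and pipelining.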
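To compensate for the simpler design, compared with contemporary processors like the Intel 80286 and Motorola 68020, some additional design features were used: conditional execution of most instructions, shifts and rotates that can be folded into data-processing instructions, and addressing modes such as PC-relative and pre- and post-increment addressing.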
The conditional execution feature (called predication) is implemented with a 4-bit condition code selector (the predicate) on every instruction; one of the four-bit codes is reserved as an "escape code" to specify certain unconditional instructions, but nearly all common instructions are conditional. Most CPU architectures only have condition codes on branch instructions.
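To make the 4-bit predicate concrete, the C sketch below evaluates a condition field against the N, Z, C and V flags, together with a helper that computes the flags a CMP would set. The sixteen encodings and their flag tests follow the ARM Architecture Reference Manual; treating the escape code 0xF as "always execute" is a simplification, and the function and type names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Condition flags held in the status register. */
typedef struct { bool n, z, c, v; } arm_flags;

/* True if an instruction whose 4-bit condition field is `cond` executes. */
bool condition_passed(uint32_t cond, arm_flags f) {
    switch (cond & 0xFu) {
    case 0x0: return f.z;                  /* EQ: equal */
    case 0x1: return !f.z;                 /* NE: not equal */
    case 0x2: return f.c;                  /* CS/HS: unsigned >= */
    case 0x3: return !f.c;                 /* CC/LO: unsigned < */
    case 0x4: return f.n;                  /* MI: negative */
    case 0x5: return !f.n;                 /* PL: positive or zero */
    case 0x6: return f.v;                  /* VS: overflow */
    case 0x7: return !f.v;                 /* VC: no overflow */
    case 0x8: return f.c && !f.z;          /* HI: unsigned > */
    case 0x9: return !f.c || f.z;          /* LS: unsigned <= */
    case 0xA: return f.n == f.v;           /* GE: signed >= */
    case 0xB: return f.n != f.v;           /* LT: signed < */
    case 0xC: return !f.z && f.n == f.v;   /* GT: signed > */
    case 0xD: return f.z || f.n != f.v;    /* LE: signed <= */
    case 0xE: return true;                 /* AL: always */
    default:  return true;                 /* 0xF: escape code (simplified) */
    }
}

/* Flags set by CMP a, b, which computes a - b and discards the result. */
arm_flags cmp_flags(int32_t a, int32_t b) {
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b, r = ua - ub;
    arm_flags f;
    f.n = (r >> 31) & 1u;
    f.z = (r == 0);
    f.c = (ua >= ub);                             /* no borrow */
    f.v = (((ua ^ ub) & (ua ^ r)) >> 31) & 1u;    /* signed overflow */
    return f;
}
```

For example, `condition_passed(0xC, cmp_flags(5, 3))` is true (GT), and LT still holds for `cmp_flags(INT32_MIN, 1)` because the overflow flag corrects the sign of the wrapped result.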
This cuts down significantly on the encoding bits available for displacements in memory access instructions, but on the other hand it avoids branch instructions when generating code for small if statements. The standard example of this is the subtraction-based Euclidean algorithm:
In the C programming language, the loop is:
while (i != j) {
    if (i > j)
        i -= j;
    else
        j -= i;
}
In ARM assembly, the loop is:
loop    CMP    Ri, Rj       ; set condition "NE" if (i != j),
                            ; "GT" if (i > j),
                            ; or "LT" if (i < j)
        SUBGT  Ri, Ri, Rj   ; if "GT" (greater than), i = i-j;
        SUBLT  Rj, Rj, Ri   ; if "LT" (less than), j = j-i;
        BNE    loop         ; if "NE" (not equal), then loop
which avoids the branches around the then and else clauses. Note that if Ri and Rj are equal, neither of the SUB instructions is executed, which removes the need for a conditional branch to implement the while check at the top of the loop, as would have been needed had, for example, SUBLE (less than or equal) been used.
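The following C function is a hypothetical model (not compiler output) of what the predicated ARM loop does on each pass: one comparison records all three conditions, both conditional subtractions are considered but at most one takes effect, and the loop test reuses the result of the same comparison. Like the original, it assumes both inputs are positive; with a zero input the loop never terminates.

```c
/* Model of the predicated GCD loop; each statement mirrors one ARM instruction. */
int gcd_predicated(int i, int j) {
    int ne, gt, lt;
    do {
        ne = (i != j);     /* CMP Ri, Rj: record NE, GT and LT together */
        gt = (i > j);
        lt = (i < j);
        if (gt) i -= j;    /* SUBGT Ri, Ri, Rj */
        if (lt) j -= i;    /* SUBLT Rj, Rj, Ri (uses the flags from CMP, not the new i) */
    } while (ne);          /* BNE loop */
    return i;
}
```

Calling `gcd_predicated(48, 18)` walks through (30, 18), (12, 18), (12, 6) and (6, 6), then makes one final pass in which the comparison finds the values equal, so neither subtraction takes effect and the loop exits with 6.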
One of the ways that Thumb code provides a denser encoding is to remove the four-bit selector from non-branch instructions.
Another feature of the instruction set is the ability to fold shifts and rotates into the "data processing" (arithmetic, logical, and register-register move) instructions, so that, for example, the C statement
a += (j << 2);
could be rendered as a single-word, single-cycle instruction on the ARM.
ADD Ra, Ra, Rj, LSL #2
This results in the typical ARM program being denser than expected with fewer memory accesses; thus the pipeline is used more efficiently. Even though the ARM runs at what many would consider to be low speeds, it nevertheless competes quite well with much more complex CPU designs.
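A minimal C sketch of the shifted second operand, assuming immediate shift amounts of 1 to 31; an amount of 0 is treated here as "no shift", ignoring the special encodings (such as RRX) that the real instruction set assigns to some zero amounts. The enum and function names are hypothetical.

```c
#include <stdint.h>

/* Shift types available to the second operand of data-processing instructions. */
typedef enum { SHIFT_LSL, SHIFT_LSR, SHIFT_ASR, SHIFT_ROR } shift_type;

/* Applies an immediate shift (amount assumed to be 0..31) to a register value. */
uint32_t shifter_operand(uint32_t value, shift_type type, unsigned amount) {
    amount &= 31u;
    if (amount == 0)
        return value;
    switch (type) {
    case SHIFT_LSL: return value << amount;
    case SHIFT_LSR: return value >> amount;
    case SHIFT_ASR:
        /* Replicate the sign bit without relying on >> of a negative int. */
        if (value & 0x80000000u)
            return (value >> amount) | ~(0xFFFFFFFFu >> amount);
        return value >> amount;
    case SHIFT_ROR: return (value >> amount) | (value << (32u - amount));
    }
    return value;
}

/* Model of ADD Ra, Ra, Rj, LSL #2: a += (j << 2) in a single instruction. */
uint32_t add_shifted(uint32_t a, uint32_t j) {
    return a + shifter_operand(j, SHIFT_LSL, 2);
}
```

On the hardware the shift happens inside the same instruction as the addition, so the C statement `a += (j << 2);` costs one instruction rather than a separate shift followed by an add.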
The ARM processor also has some features rarely seen in other RISC architectures, such as PC-relative addressing (indeed, on the ARM the PC is one of its 16 registers) and pre- and post-increment addressing modes.
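The C sketch below shows how these addresses are formed in ARM state. It assumes the standard rule that reading the PC yields the current instruction's address plus 8 (commonly explained by the original three-stage pipeline); Thumb state, where the offset is 4, is not modelled. All function names are hypothetical.

```c
#include <stdint.h>

/* Address accessed by LDR Rd, [PC, #+/-offset12] in ARM state. */
uint32_t pc_relative_address(uint32_t insn_addr, uint32_t offset12, int add) {
    uint32_t pc = insn_addr + 8u;             /* PC reads 8 bytes ahead */
    offset12 &= 0xFFFu;                       /* 12-bit immediate offset */
    return add ? pc + offset12 : pc - offset12;
}

/* Pre-indexed with writeback, LDR Rd, [Rn, #off]!: update Rn, then access it. */
uint32_t pre_index(uint32_t *rn, int32_t off) {
    *rn += (uint32_t)off;
    return *rn;
}

/* Post-indexed, LDR Rd, [Rn], #off: access the old Rn, then update it. */
uint32_t post_index(uint32_t *rn, int32_t off) {
    uint32_t addr = *rn;
    *rn += (uint32_t)off;
    return addr;
}
```

Post-indexing is what lets a loop walk through an array with one load per element and no separate pointer-increment instruction, in the style of C's `*p++`.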
The ARM instruction set has also grown over time. Some early ARM processors (prior to ARM7TDMI), for example, have no instruction to store a two-byte quantity, so, strictly speaking, it is not possible to generate efficient code for them that behaves the way one would expect for C objects of type "int16_t".
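On such cores a compiler has to synthesise a 16-bit store from narrower operations. The sketch below assumes the common approach of two separate byte stores (STRB) into little-endian memory, modelled on a plain byte array; the function names are hypothetical. Because it takes two separate accesses, the store is neither a single instruction nor a single bus transaction, which matters for shared data and 16-bit memory-mapped registers.

```c
#include <stdint.h>

/* Store a halfword as two byte stores, low byte first (little-endian assumed). */
void store_halfword_bytes(uint8_t *mem, uint32_t addr, uint16_t value) {
    mem[addr]     = (uint8_t)(value & 0xFFu);  /* first STRB: low byte */
    mem[addr + 1] = (uint8_t)(value >> 8);     /* second STRB: high byte */
}

/* Reassemble the halfword from the two bytes. */
uint16_t load_halfword_bytes(const uint8_t *mem, uint32_t addr) {
    return (uint16_t)(mem[addr] | (mem[addr + 1] << 8));
}
```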
The ARM7 and earlier implementations have a three-stage pipeline; the stages are fetch, decode, and execute. Higher-performance designs, such as the ARM9, have deeper pipelines; the Cortex-A8 has thirteen stages. Additional implementation changes for higher performance include a faster adder and more extensive branch prediction logic. The difference between the ARM7DI and ARM7DMI cores, for example, was an improved multiplier (hence the added "M").
The architecture provides a non-intrusive way of extending the instruction set using "coprocessors" which can be addressed using MCR, MRC, MRRC, MCRR, and similar instructions. The coprocessor space is divided logically into 16 coprocessors with numbers from 0 to 15, coprocessor 15 (cp15) being reserved for some typical control functions like managing the caches and MMU operation (on processors that have one).
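To show how a coprocessor register transfer names its target, the C sketch below builds the 32-bit ARM-state encoding of MCR and MRC. The field layout (cond, opc1, the L bit that selects MRC, CRn, Rt, coprocessor number, opc2 and CRm) is taken from the ARMv7 reference manual as I understand it; the function name is hypothetical. The example `MRC p15, 0, r0, c1, c0, 0` reads the cp15 system control register.

```c
#include <stdint.h>

/* Encodes MCR (ARM register -> coprocessor) or MRC (coprocessor -> ARM register).
   Layout: cond[31:28] 1110[27:24] opc1[23:21] L[20] CRn[19:16] Rt[15:12]
           coproc[11:8] opc2[7:5] 1[4] CRm[3:0]; L = 1 selects MRC. */
uint32_t encode_mcr_mrc(int is_mrc, uint32_t cond, uint32_t coproc, uint32_t opc1,
                        uint32_t rt, uint32_t crn, uint32_t crm, uint32_t opc2) {
    return ((cond & 0xFu) << 28) | (0xEu << 24) | ((opc1 & 0x7u) << 21) |
           ((is_mrc ? 1u : 0u) << 20) | ((crn & 0xFu) << 16) |
           ((rt & 0xFu) << 12) | ((coproc & 0xFu) << 8) |
           ((opc2 & 0x7u) << 5) | (1u << 4) | (crm & 0xFu);
}
```

With cond = 0xE (always), coprocessor 15, Rt = r0, CRn = c1 and the other fields zero, this yields 0xEE110F10 for the MRC and 0xEE010F10 for the matching MCR.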
In ARM-based machines, peripheral devices are usually attached to the processor by mapping their physical registers into ARM memory space or into the coprocessor space, or by connecting them to another device (a bus) which in turn attaches to the processor. Coprocessor accesses have lower latency, so some peripherals (for example, the XScale interrupt controller) are designed to be accessible in both ways (through memory and through coprocessors). In other cases, chip designers integrate hardware only through the coprocessor mechanism. For example, an image processing engine might be a small ARM7TDMI core combined with a coprocessor that has specialized operations to support a specific set of HDTV transcoding primitives.
To improve compiled code-density, processors from the ARM7TDMI on have featured the Thumb instruction set state. (The "T" in "TDMI" indicates the Thumb feature.) When in this state, the processor executes the Thumb instruction set, a variable-length instruction set providing 32-bit and 16-bit instructions. Most of the Thumb instructions are directly mapped to normal ARM instructions. The space-saving comes from making some of the instruction operands implicit and limiting the number of possibilities compared to the ARM instructions executed in the ARM instruction set state.
In Thumb, the 16-bit opcodes have less functionality. For example, only branches can be conditional, and many opcodes are restricted to accessing only half of all of the CPU's general purpose registers. The shorter opcodes give improved code density overall, even though some operations require extra instructions. In situations where the memory port or bus width is constrained to less than 32 bits, the shorter Thumb opcodes allow increased performance compared with 32-bit ARM code, as less program code may need to be loaded into the processor over the constrained memory bandwidth.
Embedded hardware, such as the Game Boy Advance, typically has a small amount of RAM accessible with a full 32-bit datapath; the majority is accessed via a 16-bit or narrower secondary datapath. In this situation, it usually makes sense to compile Thumb code and hand-optimise a few of the most CPU-intensive sections using full 32-bit ARM instructions, placing these wider instructions into the memory accessible over the 32-bit bus.
The first processor with a Thumb instruction decoder was the ARM7TDMI. All ARM9 and later families, including XScale, have included a Thumb instruction decoder.
All modern ARM processors include hardware debugging facilities; without them, software debuggers could not perform basic operations like halting, stepping, and breakpointing of code starting from reset. These facilities are built using JTAG support, though some newer cores optionally support ARM's own two-wire "SWD" protocol. In ARM7TDMI cores, the "D" represented JTAG debug support, and the "I" represented presence of an "EmbeddedICE" debug module. For ARM7 and ARM9 core generations, EmbeddedICE over JTAG was a de-facto debug standard, although it was not architecturally guaranteed.
The ARMv7 architecture defines basic debug facilities at an architectural level. These include breakpoints, watchpoints, and instruction execution in a "Debug Mode"; similar facilities were also available with EmbeddedICE. Both "halt mode" and "monitor" mode debugging are supported. The actual transport mechanism used to access the debug facilities is not architecturally specified, but implementations generally include JTAG support.
There is a separate ARM "CoreSight" debug architecture, which is not architecturally required by ARMv7 processors.
To improve the ARM architecture for digital signal processing and multimedia applications, a few new instructions were added to the set.[40] These are signified by an "E" in the name of the ARMv5TE and ARMv5TEJ architectures. E-variants also imply T, D, M and I.
The new instructions are common in digital signal processor architectures. They are variations on signed multiply-accumulate, saturated add and subtract, and count leading zeros.
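The C functions below are hypothetical reference models of two of these operations, written to show their semantics rather than how the hardware computes them: a signed saturating add in the style of QADD, which clamps to the 32-bit range instead of wrapping, and a count of leading zeros in the style of CLZ, which returns 32 for a zero input.

```c
#include <stdint.h>

/* Signed saturating add (QADD-style): clamp to the int32_t range. */
int32_t saturating_add(int32_t a, int32_t b) {
    int64_t sum = (int64_t)a + (int64_t)b;   /* cannot overflow in 64 bits */
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
}

/* Count leading zeros (CLZ-style): number of zero bits above the highest set bit. */
unsigned count_leading_zeros(uint32_t x) {
    unsigned n = 0;
    for (uint32_t bit = 0x80000000u; bit != 0 && !(x & bit); bit >>= 1)
        n++;
    return n;
}
```

Saturation is what audio and signal-processing code usually wants: a sample that overflows clips at full scale instead of wrapping around to a large value of the opposite sign.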
Jazelle is a technique that allows Java bytecode to be executed directly in the ARM architecture as a third execution state (and instruction set) alongside the existing ARM and Thumb states. Support for this state is signified by the "J" in the ARMv5TEJ architecture, and in ARM9EJ-S and ARM7EJ-S core names. Support for this state is required starting in ARMv6 (except for the ARMv7-M profile), although newer cores only include a trivial implementation that provides no hardware acceleration.
Thumb-2 technology made its debut in the ARM1156 core, announced in 2003. Thumb-2 extends the limited 16-bit instruction set of Thumb with additional 32-bit instructions to give the instruction set more breadth. A stated aim for Thumb-2 is to achieve code density similar to Thumb with performance similar to the ARM instruction set on 32-bit memory. In ARMv7 this goal can be said to have been met.
Thumb-2 extends both the ARM and Thumb instruction sets with yet more instructions, including bit-field manipulation, table branches, and conditional execution. A new "Unified Assembly Language" (UAL) supports generation of either Thumb-2 or ARM instructions from the same source code; versions of Thumb seen on ARMv7 processors are essentially as capable as ARM code (including the ability to write interrupt handlers). This requires a bit of care, and the use of a new "IT" (if-then) instruction, which permits up to four successive instructions to execute based on a tested condition. When assembling into ARM code the IT instruction is ignored, but when assembling into Thumb-2 it generates an actual instruction. For example:
; if (r0 == r1)
CMP   r0, r1
ITE   EQ       ; ARM: no code ... Thumb: IT instruction
; then r0 = r2;
MOVEQ r0, r2   ; ARM: conditional; Thumb: condition via ITE 'T' (then)
; else r0 = r3;
MOVNE r0, r3   ; ARM: conditional; Thumb: condition via ITE 'E' (else)
; recall that the Thumb MOV instruction has no bits to encode "EQ" or "NE"
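In Thumb-2 the IT instruction is a 16-bit opcode that packs the first condition and the then/else pattern of the following instructions into a 4-bit mask. The C sketch below builds that opcode, assuming the layout 0xBF00 | firstcond << 4 | mask given in the ARMv7 reference manual: each 'T' contributes the low bit of firstcond, each 'E' its inverse, and a trailing 1 marks where the block ends. The function name and its `pattern` argument are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Encodes IT{pattern} firstcond. `pattern` lists T/E for the 2nd..4th
   instructions: "" gives IT, "E" gives ITE, "TE" gives ITTE.
   Returns 0 for input this sketch does not accept. */
uint16_t encode_it(unsigned firstcond, const char *pattern) {
    size_t len = strlen(pattern);
    unsigned fc0 = firstcond & 1u;
    unsigned mask = 0;
    if (len > 3 || firstcond > 0xEu)
        return 0;
    for (size_t k = 0; k < len; k++) {
        unsigned bit;
        if (pattern[k] == 'T')
            bit = fc0;               /* then: same low bit as firstcond */
        else if (pattern[k] == 'E')
            bit = fc0 ^ 1u;          /* else: inverted low bit */
        else
            return 0;
        mask |= bit << (3 - k);
    }
    mask |= 1u << (3 - len);         /* terminating 1 marks the block length */
    return (uint16_t)(0xBF00u | (firstcond << 4) | mask);
}
```

For the ITE EQ in the example (EQ is condition 0x0), this gives 0xBF0C; a lone IT EQ would be 0xBF08.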
All ARMv7 chips support the Thumb-2 instruction set. Some chips, such as the Cortex-M3, support only the Thumb-2 instruction set. Other chips in the Cortex and ARM11 series support both "ARM instruction set mode" and "Thumb-2 instruction set mode".[41][42][43]
ThumbEE, also known as Thumb-2EE and marketed as Jazelle RCT (Runtime Compilation Target), was announced in 2005 and first appeared in the Cortex-A8 processor. ThumbEE is a fourth instruction set state, making small changes to the Thumb-2 extended Thumb instruction set. These changes make the instruction set particularly suited to code generated at runtime (e.g. by JIT compilation) in managed execution environments. ThumbEE is a target for languages such as Limbo, Java, C#, Perl and Python, and allows JIT compilers to output smaller compiled code without impacting performance.
New features provided by ThumbEE include automatic null pointer checks on every load and store instruction, an instruction to perform an array bounds check, access to registers r8-r15 (where the Jazelle/DBX Java VM state is held), and special instructions that call a handler.[44] Handlers are small sections of frequently called code, commonly used to implement a feature of a high-level language, such as allocating memory for a new object. These features are obtained by repurposing a handful of opcodes, which take on their new meaning only when the core is in ThumbEE state.
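The two checks can be modelled behaviourally, with Python exceptions standing in for the branch to a handler. This is a sketch based on our reading of the architecture, not of its encodings: `thumbee_load` and `chka` are hypothetical names. The bounds check follows our understanding of CHKA, which compares the array size and the index as unsigned 32-bit values and traps when the size is lower than or equal to the index.

```python
MASK32 = 0xFFFFFFFF


class NullCheckTrap(Exception):
    """Stands in for the branch to the ThumbEE null-pointer handler."""


class IndexCheckTrap(Exception):
    """Stands in for the branch to the ThumbEE IndexCheck handler."""


def thumbee_load(memory, base, offset=0):
    """Model a ThumbEE load: a zero base register traps instead of loading."""
    if (base & MASK32) == 0:
        raise NullCheckTrap("null base register")
    return memory[(base + offset) & MASK32]


def chka(size, index):
    """Model CHKA: trap when size <= index, compared as unsigned 32-bit values.

    Because the comparison is unsigned, a negative index such as -1 becomes
    0xFFFFFFFF and is caught by the same single check.
    """
    if (size & MASK32) <= (index & MASK32):
        raise IndexCheckTrap(f"index {index} out of bounds for size {size}")


memory = {0x1000: 42}
chka(4, 3)                          # in bounds: no trap
print(thumbee_load(memory, 0x1000))
```

The unsigned comparison is what lets a JIT compiler emit one instruction per array access instead of separate "index < 0" and "index >= size" tests.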
The Advanced SIMD extension, marketed as NEON technology, is a combined 64- and 128-bit single instruction multiple data (SIMD) instruction set that provides standardized acceleration for media and signal processing applications. NEON can execute MP3 audio decoding on CPUs running at 10 MHz and can run the GSM AMR (Adaptive Multi-Rate) speech codec at no more than 13 MHz. It features a comprehensive instruction set, separate register files and independent execution hardware. NEON supports 8-, 16-, 32- and 64-bit integer and single-precision (32-bit) floating-point data, and its SIMD operations serve audio and video processing as well as graphics and gaming. NEON can perform up to 16 operations at the same time, for example on the sixteen 8-bit lanes of a 128-bit register.
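The "16 operations at once" figure corresponds to 8-bit data in a 128-bit register. The sketch below models the effect of a single lane-wise 8-bit add (conceptually like NEON's `VADD.I8` on Q registers). It is a Python illustration of the semantics only, and `vadd_i8` is a hypothetical name.

```python
def vadd_i8(a: bytes, b: bytes) -> bytes:
    """Lane-wise 8-bit add of two 128-bit values, each given as 16 bytes.

    All 16 lanes are produced by one operation; each lane wraps modulo 256
    on its own, and no carry propagates into the neighbouring lane.
    """
    if len(a) != 16 or len(b) != 16:
        raise ValueError("a 128-bit register holds exactly 16 bytes")
    return bytes((x + y) & 0xFF for x, y in zip(a, b))


result = vadd_i8(bytes(range(16)), bytes([250] * 16))
print(list(result))
```

Lane 6 wraps from 256 to 0 without disturbing lane 7. A scalar core would need sixteen separate adds, plus masking, to produce the same result.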
VFP (Vector Floating Point) technology is a coprocessor extension to the ARM architecture. It provides low-cost single-precision and double-precision floating-point computation fully compliant with the ANSI/IEEE Std 754-1985 Standard for Binary Floating-Point Arithmetic. VFP provides floating-point computation suitable for a wide spectrum of applications such as PDAs, smartphones, voice compression and decompression, three-dimensional graphics and digital audio, printers, set-top boxes, and automotive applications. The VFP architecture also supports execution of short vector instructions but these operate on each vector element sequentially and thus do not offer the performance of true SIMD (Single Instruction Multiple Data) parallelism. This mode can still be useful in graphics and signal-processing applications, however, as it allows a reduction in code size and instruction fetch and decode overhead.
Other floating-point and/or SIMD coprocessors found in ARM-based processors include FPA, FPE and iwMMXt. They provide some of the same functionality as VFP but are not opcode-compatible with it.
The Security Extensions, marketed as TrustZone Technology, are found in ARMv6KZ and later application-profile architectures. They provide a low-cost alternative to adding a dedicated security core to an SoC, by providing two virtual processors backed by hardware-based access control. This enables the application core to switch between two states, referred to as worlds (to reduce confusion with other names for capability domains), in order to prevent information from leaking from the more trusted world to the less trusted world. This world switch is generally orthogonal to all other capabilities of the processor, so each world can operate independently of the other while using the same core. Memory and peripherals are then made aware of the operating world of the core and may use this to provide access control to secrets and code on the device. A typical application of TrustZone Technology is to run a rich operating system in the less trusted world and smaller, security-specialized code in the more trusted world (known as TrustZone Software, a TrustZone-optimized version of the Trusted Foundations(TM) Software developed by Trusted Logic). This allows much tighter Digital Rights Management for controlling the use of media on ARM-based devices,[45] and helps prevent any unapproved use of the device.
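The access-control idea, in which memory and peripherals see which world issued each access, can be illustrated with a toy address filter. Everything here is hypothetical (`World`, `Region` and `access_allowed` are invented for illustration). Real TrustZone systems implement this with bus signals and vendor-specific controllers whose details, as noted below, are largely undisclosed.

```python
from dataclasses import dataclass
from enum import Enum


class World(Enum):
    SECURE = "secure"          # the more trusted world
    NON_SECURE = "non-secure"  # the less trusted world (e.g. the rich OS)


@dataclass(frozen=True)
class Region:
    start: int          # inclusive start address
    end: int            # exclusive end address
    secure_only: bool   # True: only Secure-world accesses are allowed


def access_allowed(regions, address, world):
    """Hypothetical filter: each access arrives tagged with the issuing world,
    and secure-only regions reject non-secure accesses. Unmapped addresses
    are denied in this sketch."""
    for region in regions:
        if region.start <= address < region.end:
            return world is World.SECURE or not region.secure_only
    return False


regions = [Region(0x0000, 0x1000, True), Region(0x1000, 0x2000, False)]
print(access_allowed(regions, 0x0800, World.NON_SECURE))
```

Because the check keys only on the world tag, the same core, caches and instruction set serve both worlds, which matches the description of the world switch as orthogonal to the processor's other capabilities.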
In practice, since the specific implementation details of TrustZone are proprietary and have not been publicly disclosed for review, it is unclear what level of assurance is provided for a given threat model.
Starting with ARMv6, the ARM architecture supports no-execute page protection, referred to as XN (eXecute Never).[46]
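XN is a single bit in a page-table descriptor. The sketch below assumes the ARMv7 short-descriptor second-level "small page" layout (also used by ARMv6 with its extended page-table format), in which bit 1 set identifies a small page and bit 0 is the XN bit. Other descriptor types place XN elsewhere and are not handled. Both helper names are hypothetical.

```python
def small_page_xn(descriptor: int) -> bool:
    """Return True if a small-page descriptor marks its page execute-never.

    Assumes bit 1 set identifies a small page and bit 0 is XN.
    """
    if not ((descriptor >> 1) & 1):
        raise ValueError("not a small-page descriptor")
    return bool(descriptor & 1)


def set_small_page_xn(descriptor: int) -> int:
    """Return the descriptor with XN set, e.g. for a data-only page such as a stack."""
    if not ((descriptor >> 1) & 1):
        raise ValueError("not a small-page descriptor")
    return descriptor | 1


entry = 0x80000002                    # small page at 0x80000000, XN clear
print(small_page_xn(set_small_page_xn(entry)))
```

Marking stack and heap pages XN means that code injected into them faults when the processor tries to fetch it as instructions.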
ARM Ltd does not manufacture and sell CPU devices based on its own designs, but rather licenses the processor architecture to interested parties. ARM offers a variety of licensing terms, varying in cost and deliverables. To all licensees, ARM provides an integratable hardware description of the ARM core, as well as a complete software development toolset (compiler, debugger, SDK) and the right to sell manufactured silicon containing the ARM CPU. Fabless licensees who wish to integrate an ARM core into their own chip design are usually interested only in acquiring a ready-to-manufacture, verified IP core. For these customers, ARM delivers a gate netlist description of the chosen ARM core, along with an abstracted simulation model and test programs to aid design integration and verification. More ambitious customers, including integrated device manufacturers (IDMs) and foundry operators, choose to acquire the processor IP in synthesizable RTL (Verilog) form. With the synthesizable RTL, the customer can perform architectural-level optimizations and extensions. This allows the designer to achieve exotic design goals not otherwise possible with an unmodified netlist (high clock speed, very low power consumption, instruction set extensions, etc.). While ARM does not grant the licensee the right to resell the ARM architecture itself, licensees may freely sell manufactured products (chip devices, evaluation boards, complete systems, etc.). Merchant foundries can be a special case: not only are they allowed to sell finished silicon containing ARM cores, they generally hold the right to remanufacture ARM cores for other customers.
Like most IP vendors, ARM prices its IP based on perceived value. In architectural terms, lower-performance ARM cores command a lower license cost than higher-performance cores. In terms of silicon implementation, a synthesizable core is more expensive than a hard macro (black-box) core. Complicating price matters, a merchant foundry that holds an ARM license (such as Samsung or Fujitsu) can offer reduced licensing costs to its fab customers. In exchange for acquiring the ARM core through the foundry's in-house design services, the customer can reduce or eliminate payment of ARM's upfront license fee. Compared to dedicated semiconductor foundries without in-house design services (such as TSMC and UMC), Fujitsu and Samsung charge two to three times more per manufactured wafer. For low- to mid-volume applications, a design-service foundry offers lower overall pricing (through subsidization of the license fee). For high-volume, mass-produced parts, the long-term cost reduction achievable through lower wafer pricing reduces the impact of ARM's NRE (non-recurring engineering) costs, making the dedicated foundry a better choice.
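The volume trade-off can be made concrete with a simple break-even model. This is an illustrative sketch under stated assumptions, not ARM's or any foundry's pricing formula. Let $N$ be the number of wafers, $w$ the dedicated foundry's price per wafer, $k$ the design-service foundry's price multiple (2 to 3 per the text), and $F$ ARM's upfront license fee, which the design-service route is assumed to waive entirely. Royalties are assumed equal on both routes and are omitted.

```latex
\begin{aligned}
C_{\mathrm{ded}}(N) &= F + N w, \qquad C_{\mathrm{svc}}(N) = k N w,\\
C_{\mathrm{ded}}(N^{*}) = C_{\mathrm{svc}}(N^{*}) &\;\Longrightarrow\; N^{*} = \frac{F}{(k-1)\,w},\\
2 \le k \le 3 &\;\Longrightarrow\; \frac{F}{2w} \le N^{*} \le \frac{F}{w}.
\end{aligned}
```

Below $N^{*}$ the design-service foundry is cheaper, because the waived fee outweighs the wafer premium. Above it, the dedicated foundry's lower wafer price dominates, which matches the low-volume versus high-volume conclusion above.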
Many semiconductor and IC design firms hold ARM licenses: Analog Devices, Atmel, Broadcom, Cirrus Logic, Energy Micro, Faraday Technology, Freescale, Fujitsu, Intel (through its settlement with Digital Equipment Corporation), IBM, Infineon Technologies, Nintendo, NXP Semiconductors, OKI, Qualcomm, Samsung, Sharp, STMicroelectronics, Texas Instruments and VLSI are some of the many companies that have licensed the ARM in one form or another.
ARM's 2006 annual report and accounts state that royalties totalling £88.7 million ($164.1 million) were the result of licensees shipping 2.45 billion units.[47] This is equivalent to £0.036 ($0.067) per unit shipped. However, this is averaged across all cores, including expensive new cores and inexpensive older cores.
In the same year ARM's licensing revenues for processor cores were £65.2 million (US$119.5 million),[48] in a year when 65 processor licenses were signed,[49] an average of £1 million ($1.84 million) per license. Again, this is averaged across both new and old cores.
Given that ARM's 2006 income from processor cores was approximately 60% from royalties and 40% from licenses, ARM makes the equivalent of £0.06 ($0.11) per unit shipped including both royalties and licenses. However, as one-off licenses are typically bought for new technologies, unit sales (and hence royalties) are dominated by more established products. Hence, the figures above do not reflect the true costs of any single ARM product.
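The per-unit and per-license averages above can be reproduced from the quoted 2006 figures. The sketch below uses only those numbers. Computed directly, royalties make up about 58% of core income, and total core income per unit comes to about £0.063 ($0.116). These values are close to the rounded figures given above, which were derived from the approximate 60/40 split.

```python
# 2006 figures quoted above, in millions of pounds / US dollars.
ROYALTIES_GBP, ROYALTIES_USD = 88.7, 164.1
LICENSES_GBP, LICENSES_USD = 65.2, 119.5
UNITS_M = 2450.0        # 2.45 billion units shipped, expressed in millions
LICENSES_SIGNED = 65

royalty_per_unit_gbp = ROYALTIES_GBP / UNITS_M        # about 0.036
royalty_per_unit_usd = ROYALTIES_USD / UNITS_M        # about 0.067
license_avg_gbp_m = LICENSES_GBP / LICENSES_SIGNED    # about 1.00 (million per license)
license_avg_usd_m = LICENSES_USD / LICENSES_SIGNED    # about 1.84 (million per license)

total_gbp = ROYALTIES_GBP + LICENSES_GBP
royalty_share = ROYALTIES_GBP / total_gbp             # about 0.58 of core income
combined_per_unit_gbp = total_gbp / UNITS_M           # about 0.063
combined_per_unit_usd = (ROYALTIES_USD + LICENSES_USD) / UNITS_M  # about 0.116

print(f"royalty/unit: GBP {royalty_per_unit_gbp:.3f} (USD {royalty_per_unit_usd:.3f})")
print(f"royalty share of core income: {royalty_share:.0%}")
print(f"combined/unit: GBP {combined_per_unit_gbp:.3f} (USD {combined_per_unit_usd:.3f})")
```

As the text cautions, these are blended averages across old and new cores, not the price of any single product.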
The ARM architecture is supported by Unix and Unix-like operating systems, including Linux, BSD, QNX, Plan 9 from Bell Labs, Inferno, Solaris, iOS, webOS and Android.
The following Linux distributions support ARM processors:
The following BSD derivatives support ARM processors: